WP-Dyna: Planning and Reinforcement Learning in Well-Plannable Environments
Abstract
Reinforcement learning (RL) involves sequential decision making in uncertain environments. The aim of the decision-making agent is to maximize the benefit of acting in its environment over an extended period of time. Finding an optimal policy in RL may be very slow. A commonly used way to speed up learning is to integrate planning, for example through Sutton’s Dyna algorithm or various other methods that use macro-actions. Here we propose separating the plannable, i.e., close to deterministic, parts of the world and focusing planning efforts on this domain. A novel reinforcement learning method called WP-Dyna is proposed. WP-Dyna builds a simple model, which is used to search for macro-actions. The simplicity of the model makes planning computationally inexpensive. It is shown that WP-Dyna finds an optimal policy and that the plannable macro-actions found by WP-Dyna are near-optimal. As a result, it is unnecessary to try large numbers of macro-actions, which enables fast learning. The utility of WP-Dyna is demonstrated by computer simulations.
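For orientation, the Dyna idea that the abstract builds on can be sketched as tabular Dyna-Q: each real transition updates the value estimates and a learned model, and the model is then replayed for a few extra "planning" updates. This is a minimal sketch of generic Dyna-Q, not of WP-Dyna itself; the chain environment, the function name `dyna_q`, and all hyperparameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def dyna_q(n_states=6, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical deterministic chain.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    goal = n_states - 1
    q = defaultdict(float)  # (state, action) -> value estimate
    model = {}              # (state, action) -> (reward, next_state)

    def step(s, a):
        # Assumed environment dynamics: deterministic moves along the chain.
        s2 = max(0, s - 1) if a == 0 else s + 1
        return (1.0 if s2 == goal else 0.0), s2

    def update(s, a, r, s2):
        # One-step Q-learning backup; the goal state has value 0.
        best_next = 0.0 if s2 == goal else max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[(s, b)] for b in actions)
        return rng.choice([b for b in actions if q[(s, b)] == best])

    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)              # learn from real experience
            model[(s, a)] = (r, s2)          # record the observed transition
            for _ in range(planning_steps):  # replay simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q
```

Calling `dyna_q()` returns the learned action values; on this chain, moving right should end up valued above moving left in every non-goal state. Storing a single next state per state-action pair is only adequate because the assumed environment is deterministic, which is the kind of region the abstract describes as well-plannable.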
Similar resources
Searching for Plannable Domains can Speed up Reinforcement Learning
Reinforcement learning (RL) involves sequential decision making in uncertain environments. The aim of the decision-making agent is to maximize the benefit of acting in its environment over an extended period of time. Finding an optimal policy in RL may be very slow. To speed up learning, one often used solution is the integration of planning, for example, Sutton’s Dyna algorithm, or various oth...
Integrated Architectures for Learning, Planning, and Reacting Based
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned model of the world. In this paper, I present and show results for two Dyna archi...
Integrated Modeling and Control Based on Reinforcement Learning
This is a summary of results with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures,...
Planning with neural networks and reinforcement learning
planning with neural networks, time limits of discounted reinforcement learning Planning, taskability, Dyna-PI architectures Dyna-PI architectures: focussing, forward and backward planning, acting and (re)planning. Tested with... Ideas from problem solving and
Reinforcement Learning with a Hierarchy of Abstract Models
Reinforcement learning (RL) algorithms have traditionally been thought of as trial and error learning methods that use actual control experience to incrementally improve a control policy. Sutton's DYNA architecture demonstrated that RL algorithms can work as well using simulated experience from an environment model, and that the resulting computation was similar to doing one-step lookahead plan...
Publication date: 2006